Self-Correlation and Maximum Independence 
in Finite Relations 


Dilian Gurov 

KTH Royal Institute of Technology, Stockholm, Sweden 
dilian@csc.kth.se


Minko Markov 

“St. Kliment Ohridski” University of Sofia, Sofia, Bulgaria 
minkom@fmi.uni-sofia.bg


We consider relations with no order on their attributes, as in Database Theory. An independent
partition of the set of attributes $S$ of a finite relation $R$ is any partition $X$ of $S$ such that the join of
the projections of $R$ over the elements of $X$ yields $R$. Identifying independent partitions has many
applications and corresponds conceptually to revealing orthogonality between sets of dimensions in
multidimensional point spaces. A subset of $S$ is termed self-correlated if there is a value of each
of its attributes such that no tuple of $R$ contains all those values. This paper uncovers a connection
between independence and self-correlation, showing that the maximum independent partition is the
least fixed point of a certain inflationary transformer $\alpha$ that operates on the finite lattice of partitions
of $S$. The transformer $\alpha$ is defined via the minimal self-correlated subsets of $S$. We use some additional
properties of $\alpha$ to show that the said fixed point is still the limit of the standard approximation sequence,
just as in Kleene's well-known fixed point theorem for continuous functions.

1 Introduction 

The problem of discovering independence between sets of points in a multidimensional space is a
fundamental problem in science. It arises naturally in many areas of Computer Science. For instance, with
respect to relational data, discovering such independence allows exponential gains in storage space and 
processing of information lITTIl . [JJ, and can facilitate the problem of machine learning |[T3ll . With respect 
to problem clusterisation of multidimensional relational data, finding independence helps finding the
desired clusters [5, 8]. Decomposing data into smaller units that are independent except at their interfaces
has been known to be essential for understanding large legacy systems [11]. Independence has also been
the subject of recent works in logic, giving rise to so-called logics of dependence and independence [4]. 

The concrete motivation for the present work derives from the area of software product line
engineering, a discipline that aims at planning for and developing a family of products through managed
reuse in order to decrease time to market and improve software quality [12]. A software family can be
modelled as a relation whose attributes are the software’s functionalities. The various implementations 
of each functionality in the form of software artefacts are the attributes’ values. The individual products 
of a family are thus modelled as the tuples of that relation over the attributes. In previous works [6, 15]
we considered a restricted class of software families called simple families (later on we changed the term 
“families” to the more abstract term “relations”), where discovery of independence and a compositional 
model checking technique are utilised to derive a divide-and-conquer verification strategy. Simple
relations constitute the least class that contains the single-attribute, single-value relations and is closed under
join of relations with disjoint attribute sets and unions of relations over the same set of attribute names
but with disjoint value sets. In the present work we generalise these previous results to discovering
independence in arbitrary relations. We investigate decompositions of a relation R with disjoint attributes
such that R equals the join of the component relations. Every decomposition is represented by a partition 
of the set of attributes of R. Such partitions are termed independent partitions. 

R. Matthes, M. Mio (Eds.): Fixed Points 

in Computer Science 2015 (FICS 2015) © D. Gurov, M. Markov 

EPTCS 191, 2015, pp. fiO-lIll doi: 10.4204/EPTCS.191.7 


The problem of computing a maximum decomposition of this kind has previously been studied
in [10], where it is referred to as prime factorisation, and an efficient algorithmic solution is proposed. In
this paper we investigate an alternative approach that works purely on the level of the attributes of R and 
is based on the concept of correlation between attributes. We have discovered a nontrivial connection 
between independence and correlation and the major goal of this paper is to demonstrate that connection. 

A first observation is that the decomposition problem cannot be solved purely based on analysis 
of pairs of attributes. In the aforementioned work [6] we compute dependence (or independence) in
simple relations by computing correlation between pairs of attributes. That approach does not generalise 
for arbitrary relations as we show in this paper. Our solution is to introduce self-correlation of sets 
(of arbitrary cardinality) of attributes. In other words, the current notion of correlation is a hypergraph
whose hyperedges are the self-correlated sets, rather than an ordinary graph as was the case with the
simple relations. Since self-correlated sets are upward closed under set inclusion (Proposition 2), the
minimal self-correlated sets, or the mincors (Definition 4), are the foundation of our analysis. A second
observation is that mincors do not cross independent partitions (Lemma 5), hence one can safely merge
overlapping mincors to compute the maximum independent partition. In the case of simple relations
that merger indeed yields the maximum independent partition [6] but in arbitrary relations merging the
mincors does not necessarily output an independent partition, as the example in Section 4.1 shows. We
overcome this hindrance with the help of a final important insight. Let X be the partition of the set of 
attributes that results from merging overlapping mincors. The relation can be factored on X, producing a 
quotient relation. In other words, the elements of X are considered atomic now; the subsets of X may or 
may not be self-correlated in their turn, and the said quotient relation is defined via those new mincors. 
We show that the procedure of identifying mincors and merging overlapping ones can be repeated on this 
quotient relation and this can be iterated until stabilisation, yielding the desired maximum independent 
partition. 

The above insights suggest that relational decomposition can be presented in terms of a transformer 
over the finite lattice of quotient relations, or conceptually even simpler, over the lattice of the partitions 
ordered by refinement, inducing the former lattice. The transformer $\alpha$ on partitions introduced here
essentially corresponds to identifying the mincors of the quotient relation induced by a partition, merging
the overlapping ones, and extracting from the result the corresponding partition (Definition 5). We prove
that the independent partitions correspond exactly to the fixed points of $\alpha$ (Theorem 1).

If $\alpha$ were monotone, one could utilise two well-known fixed point theorems on complete lattices (having
in mind that monotone functions over finite lattices are continuous). First, by Tarski's fixed point theorem
for complete lattices [16], the set of fixed points forms a lattice itself with respect to the same ordering,
hence there is a unique least fixed point (LFP), which in our case would be precisely the maximum
independent partition that we are after. And second, one could utilise Kleene's fixed point theorem [1],
to the effect that the LFP can be computed iteratively, starting from the bottom of the lattice, i.e. the
partition into singletons, and applying $\alpha$ until stabilisation, i.e., until the fixed point is reached. It turns
out, however, that $\alpha$ in general is not monotone, as demonstrated by the example in Section 5.1, and
therefore the above reasoning is not applicable.

On the other hand, we show that $\alpha$ is inflationary (Proposition ID). The existence of an LFP is
established by showing that there exists a fixed point and the set of all fixed points is closed under intersection
(LemmaO. Furthermore, the downward closure of the LFP, i.e., the set of all partitions refining it, is closed
under $\alpha$ (Lemma [Hi. Since the lattice is finite, these results give rise to a modified version of Kleene's
fixed point theorem, formulated in terms of inflationary transformers rather than monotone ones
(Theorem 2), justifying the same iterative fixed point computation procedure (Corollary [H. The proposed
characterisation reduces relational decomposition to the problem of identifying the mincors of a relation.

Organisation The paper is organised as follows. Section 2 recalls some known notions and results
about sets and families, partitions, lattices, fixed points, relations, attributes, relation schemes, and
quotient relations, and defines independent partitions of the attribute set. Section 3 develops the theory of
self-correlated sets in quotient relations and how they relate w.r.t. partition abstraction. Section 4 presents
many useful lemmas that concern independence. Section 5 defines the transformer $\alpha$ and contains our
main result, Theorem 2. Section 6 discusses what we currently know about the area of decomposition of
relations, also called factorisation of relations, and compares the approach and the results of this paper
with similar works. The final Section 7 draws some conclusions and outlines directions for future work.

2 Background 

In this section we recall some standard set-theoretical notions and notation needed for our theoretical 
developments. 

2.1 Sets, covers, and partitions 

In this work we consider only finite sets. The powerset of a set $A$ is denoted by $\mathrm{POW}(A)$, and
$\mathrm{POW}^{+}(A)$ denotes $\mathrm{POW}(A) \setminus \{\emptyset\}$. Ground sets are nonempty sets over which we construct
the families that are our subject of research.

Let $A$ be a ground set. A family over $A$ is any nonempty subset of $\mathrm{POW}^{+}(A)$. A family $F$ is a Sperner
family if $\forall X, Y \in F : X \not\subset Y$. $F$ is connected if $\forall X, Z \in F$: $X \cap Z \neq \emptyset$ or $F$ has elements
$Y_1, Y_2, \ldots, Y_k$ for some $k \geq 1$, such that $X \cap Y_1 \neq \emptyset$, $Y_i \cap Y_{i+1} \neq \emptyset$ for $1 \leq i \leq k-1$, and
$Y_k \cap Z \neq \emptyset$. A connected component of a family is any maximal connected subfamily in it. We use
$\mathrm{CC}(F)$ to denote the family $\{\bigcup B \mid B \text{ is a connected component of } F\}$. A superfamily over $A$ is any
nonempty subset of $\mathrm{POW}^{+}(\mathrm{POW}^{+}(A))$.
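
The operator CC can be sketched in a few lines of Python. This is only an illustration and not code from the paper: the function name `cc` and the encoding of a family as a collection of `frozenset` objects are assumptions made here.

```python
def cc(family):
    """Return {union of B | B is a connected component of family}."""
    blocks = []  # pairwise disjoint unions built so far
    for member in family:
        merged = set(member)
        # Every existing block that overlaps the new member joins its component.
        for block in [b for b in blocks if b & merged]:
            blocks.remove(block)
            merged |= block
        blocks.append(merged)
    return {frozenset(b) for b in blocks}


# Hypothetical family over the ground set {a, b, c, d, e, f}.
family = [frozenset("ab"), frozenset("de"), frozenset("bc"), frozenset("f")]
components = cc(family)
```

Because the blocks kept so far are pairwise disjoint, it suffices to look only at the blocks that overlap the member being added.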

Suppose $A$ is a set. A cover of $A$ is any family $F$ over $A$ such that $\bigcup F = A$. The set of all covers
of $A$ is denoted by $K(A)$. If $\mathfrak{X} \in K(A)$ and $Y \cap Z = \emptyset$ for all distinct $Y, Z \in \mathfrak{X}$, we say $\mathfrak{X}$ is a partition
of $A$. If $|\mathfrak{X}| = 1$ the partition is trivial and if $|\mathfrak{X}| = |A|$ the partition is the partition into singletons. Note that
CC(F) defined above is a partition of the ground set. We denote by ^ X the fact that for some B G A, 
is a family over B such that every element of is a subset of precisely one element of X and every 
element of X is a superset of at most one element of Xj. For example, if A = {a,b,c,d,e,f,g,h,k} then 
{{a,b},{c},{d,ej,g},{h,k}}. 

The set of all partitions of $A$ is denoted by $\Pi(A)$. For any $P_1, P_2 \in \Pi(A)$, $P_1$ refines $P_2$, which we
denote by $P_1 \sqsubseteq P_2$, if

$$\forall X \in P_1\ \exists Y \in P_2 : X \subseteq Y$$

Conversely, we say that $P_2$ abstracts $P_1$. If $P_1 \sqsubseteq P_2$ and $P_1 \neq P_2$ we write $P_1 \sqsubset P_2$.
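
The refinement test translates directly into code. The following Python sketch is illustrative only; the name `refines` and the encoding of partitions as sets of `frozenset` blocks are assumptions of this example.

```python
def refines(p1, p2):
    """True iff every block of p1 is contained in some block of p2."""
    return all(any(x <= y for y in p2) for x in p1)


# Hypothetical partitions of the ground set {a, b, c}.
bottom = {frozenset("a"), frozenset("b"), frozenset("c")}
left = {frozenset("ab"), frozenset("c")}
right = {frozenset("a"), frozenset("bc")}
top = {frozenset("abc")}
```

Here `left` and `right` are incomparable: neither refines the other, so refinement is a partial order and not a total one.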

2.2 Partial orders, lattices, and chains 

We denote generic partial orders by $\preceq$. If $(A, \preceq)$ is a poset, a least element of $A$ is any $x \in A$ such that
$\forall y \in A : x \preceq y$ and a greatest element of $A$ is any $x \in A$ such that $\forall y \in A : y \preceq x$. A least element may not
exist but if it exists it is unique; the same holds for a greatest element. The least element is called bottom
and is denoted by $\bot$. The greatest element is called top and is denoted by $\top$. A chain in a poset $(A, \preceq)$ is
any $B \subseteq A$ such that $\forall x, y \in B : x \preceq y \lor y \preceq x$.

A lattice is a poset $(A, \preceq)$, shortly $A$ when $\preceq$ is understood, such that for any $x, y \in A$ there exists a
(unique) greatest lower bound in $A$ called meet and denoted by $x \sqcap y$ and a (unique) least upper bound
in $A$ called join and denoted by $x \sqcup y$. Collectively, $\sqcap$ and $\sqcup$ are the lattice operations of $A$. They are
commutative and associative [3, pp. 8]. We generalise the lattice operations on subsets of $A$ in the obvious
way. A complete lattice is a lattice such that every $B \subseteq A$ has a meet $\bigsqcap B$ and a join $\bigsqcup B$. In particular,
$A$ has a meet $\bigsqcap A = \bot$ and a join $\bigsqcup A = \top$. Every finite lattice is complete [3, pp. 46], therefore from now
on by lattice we mean complete lattice. For any $x \in A$, the sets $\{y \in A \mid y \preceq x\}$ and $\{y \in A \mid x \preceq y\}$ are
called down-$x$ and up-$x$ and are denoted by $\downarrow x$ and $\uparrow x$, respectively [3, pp. 20].

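It is well-known that $(\Pi(A), \sqsubseteq)$ is a lattice. Furthermore, $\bot$ is the partition into singletons, $\top$ is
the trivial partition, and for any $P_1, P_2 \in \Pi(A)$, $P_1 \sqcap P_2 = \{X \cap Y \mid X \in P_1, Y \in P_2\} \setminus \{\emptyset\}$ and
$P_1 \sqcup P_2 = \mathrm{CC}(P_1 \cup P_2)$ (see [3, pp. 15]). We extend the "$\sqcap$" notation to subsets of partitions: for any
$\mathfrak{X}, \mathfrak{Y} \in \Pi(A)$, for any nonempty $\mathfrak{X}' \subseteq \mathfrak{X}$ and any nonempty $\mathfrak{Y}' \subseteq \mathfrak{Y}$ such that
$\bigcup \mathfrak{X}' \cap \bigcup \mathfrak{Y}' \neq \emptyset$, $\mathfrak{X}' \sqcap \mathfrak{Y}'$ denotes the set
$\{B \cap C \mid B \in \mathfrak{X}', C \in \mathfrak{Y}'\} \setminus \{\emptyset\}$.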
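
As a small illustration of the two lattice operations on partitions, here is a Python sketch. The function names `meet` and `join` and the `frozenset` encoding are assumptions of this example, not notation from the paper.

```python
def meet(p1, p2):
    """All nonempty pairwise intersections of blocks."""
    return {x & y for x in p1 for y in p2 if x & y}


def join(p1, p2):
    """CC(p1 ∪ p2): merge blocks that overlap, directly or through a chain."""
    blocks = []
    for member in list(p1) + list(p2):
        merged = set(member)
        for block in [b for b in blocks if b & merged]:
            blocks.remove(block)
            merged |= block
        blocks.append(merged)
    return {frozenset(b) for b in blocks}


# Hypothetical partitions of {a, b, c, d}.
p1 = {frozenset("ab"), frozenset("c"), frozenset("d")}
p2 = {frozenset("a"), frozenset("bc"), frozenset("d")}
lower = meet(p1, p2)   # the partition into singletons
upper = join(p1, p2)   # {a, b, c} and {d}
```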

2.3 Functions and fixed points 

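Suppose $A$ is a set and $f : A \to A$ is a function. For every $x \in A$: $f^{0}(x) \stackrel{\mathrm{def}}{=} x$ and for every $n \in \mathbb{N}^{+}$,
$f^{n}(x) \stackrel{\mathrm{def}}{=} f(f^{n-1}(x))$. For every $n \in \mathbb{N}$, $f^{n}(x)$ is the $n$-th iterate of $f$. A fixed point of $f$ is every $x \in A$ such
that $f(x) = x$. Let $(A, \preceq)$ be a poset. A function $f : A \to A$ is monotone if $\forall x, y \in A : x \preceq y \to f(x) \preceq f(y)$
and $f$ is inflationary if $\forall x \in A : x \preceq f(x)$ [14, pp. 263].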

A well-known fixed point theorem is Tarski's fixed point theorem for continuous functions over
complete lattices [16], stating that the set of fixed points is non-empty and forms a lattice itself with
respect to the same ordering, and hence the function has a unique least fixed point (LFP). Another
well-known theorem due to Kleene states the existence of an LFP for continuous functions on chain-complete
partial orders [1], and that the LFP can be computed iteratively, starting from the bottom of the lattice
and applying the function until stabilisation.
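
The iterative computation mentioned above can be sketched as follows. This is a generic illustration, not the paper's procedure: the name `iterate_to_fixed_point` and the toy function `step` on subsets of {0, ..., 4} are hypothetical, and termination is only guaranteed when the iterates form an increasing sequence in a finite poset.

```python
def iterate_to_fixed_point(f, x):
    """Apply f repeatedly, starting from x, until f(x) == x."""
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt


def step(s):
    """Toy monotone and inflationary map on subsets of {0, ..., 4} ordered by inclusion."""
    return frozenset(s | {0} | {min(i + 1, 4) for i in s})


# Start from the bottom element (the empty set); one element is added per round.
least_fixed_point = iterate_to_fixed_point(step, frozenset())
```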

2.4 Schemes, relations, and quotient relations 

The following definitions are close to the ones in [9]. A scheme is a nonempty set $S = \{A_1, \ldots, A_n\}$
whose elements, called the attributes, are nonempty sets. For every attribute, its elements are said to be
its values. A relation over $S$ is a nonempty set of total functions $\{t_1, t_2, \ldots, t_p\}$, which we call the tuples,
such that for $1 \leq j \leq p$, $t_j : S \to \bigcup S$, with the restriction that $t_j(A_i) \in A_i$, for $1 \leq i \leq n$. We assume that
every value of every attribute occurs in at least one tuple.

The relations we have in mind are as in Relational Database Theory, i.e. with unordered tuples, rather
than as in Set Theory, i.e. with ordered tuples.

We further postulate that the said attributes are mutually disjoint sets. That allows a simplification of
the definition of relation: a relation over $S$ is a nonempty set of tuples, each tuple being an $n$-element set
with precisely one element from every attribute. To save space, we often write the tuples without commas
between their elements. For example, let $n = 3$, $A_1 = \{a_1, a_2\}$, $A_2 = \{b_1, b_2\}$, and $A_3 = \{c_1, c_2, c_3\}$. One
of the relations over the scheme $\{A_1, A_2, A_3\}$ is written as $\{\{a_1 b_1 c_1\}, \{a_1 b_2 c_2\}, \{a_2 b_2 c_3\}\}$.

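Let $S_1, S_2, \ldots, S_k$ be schemes such that for $1 \leq i < j \leq k$, $\forall A \in S_i\ \forall B \in S_j : A \cap B = \emptyset$. Let $R_i$ be a
relation over $S_i$, for $1 \leq i \leq k$. The join of $R_1, \ldots, R_k$ is the relation

$$R_1 \bowtie R_2 \bowtie \cdots \bowtie R_k = \{\bigcup \{x_1, x_2, \ldots, x_k\} \mid x_1 \in R_1, x_2 \in R_2, \ldots, x_k \in R_k\}$$

The complete relation over $S = \{A_1, \ldots, A_n\}$ is $\bowtie_{i=1}^{n} \{\{x\} \mid x \in A_i\}$. Clearly, its cardinality is $\prod_{i=1}^{n} |A_i|$.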

Let $S = \{A_1, \ldots, A_n\}$ be a scheme. A subscheme of $S$ is any nonempty subset of $S$. The notation $f \upharpoonright Z$
stands for the restriction of $f$ to $Z$, for any function $f : X \to Y$ and any $Z \subseteq X$. Let $R = \{t_1, t_2, \ldots, t_p\}$ be
a relation over $S$ and let $T$ be a subscheme of $S$. The projection of $R$ on $T$ is $R \upharpoonright T = \{t_j \upharpoonright T : 1 \leq j \leq p\}$.

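Definition 1 (quotient relation) Let $R$ be a relation over some scheme $S$. For any $\mathfrak{X} = \{X_1, X_2, \ldots, X_n\} \in \Pi(S)$,
$R/\mathfrak{X}$ is the following relation, where each $y_i$ ranges over $R \upharpoonright X_i$:

$$\{y_1, y_2, \ldots, y_n\} \in R/\mathfrak{X} \ \text{ iff } \ \exists t \in R\ \forall_{1 \leq i \leq n}\, (t \upharpoonright X_i = y_i)$$

We term $R/\mathfrak{X}$ the quotient relation of $R$ relative to $\mathfrak{X}$. When $\mathfrak{X}$ is understood we say simply the quotient
relation of $R$.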

We emphasise the quotient relation is not over S but over a partition of S. 

Here is an example of a quotient relation. Let $S = \{A, B, C, D\}$, let each attribute have precisely two
values, say $A = \{a_1, a_2\}$ and so on, let $\mathfrak{X}_1 = \{\{A,B\}, \{C,D\}\}$, let $\mathfrak{X}_2 = \{\{A\}, \{B\}, \{C\}, \{D\}\}$, and let

$$R' = \{\{a_1 b_1 c_1 d_1\}, \{a_1 b_1 c_2 d_2\}, \{a_1 b_2 c_1 d_2\}, \{a_2 b_2 c_1 d_1\}, \{a_2 b_2 c_2 d_2\}\} \tag{1}$$

be a relation over $S$. Then

$$R'/\mathfrak{X}_1 = \{\{\{a_1,b_1\}\{c_1,d_1\}\}, \{\{a_1,b_1\}\{c_2,d_2\}\}, \{\{a_1,b_2\}\{c_1,d_2\}\}, \{\{a_2,b_2\}\{c_1,d_1\}\}, \{\{a_2,b_2\}\{c_2,d_2\}\}\} \tag{2}$$

$$R'/\mathfrak{X}_2 = \{\{\{a_1\}\{b_1\}\{c_1\}\{d_1\}\}, \{\{a_1\}\{b_1\}\{c_2\}\{d_2\}\}, \{\{a_1\}\{b_2\}\{c_1\}\{d_2\}\}, \{\{a_2\}\{b_2\}\{c_1\}\{d_1\}\}, \{\{a_2\}\{b_2\}\{c_2\}\{d_2\}\}\} \tag{3}$$

A quotient relation is but a grouping together of the tuples of the original relation into subtuples according
to the partition. It trivially follows that $|R/\mathfrak{X}| = |R|$ for any relation $R$ over any attribute set $S$ and any
$\mathfrak{X} \in \Pi(S)$.
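
The grouping into subtuples can be made concrete with a short Python sketch. The encoding is an assumption of this example, not the paper's: a tuple is a `frozenset` of (attribute, value) pairs, the row string "1122" stands for the tuple a1 b1 c2 d2, and `make_relation` and `quotient` are hypothetical helper names.

```python
def make_relation(attrs, rows):
    """Build a relation; each row lists one value index per attribute."""
    return {frozenset(zip(attrs, row)) for row in rows}


def quotient(relation, partition):
    """Group every tuple into one subtuple per block of the partition."""
    return {
        frozenset(
            frozenset((a, v) for a, v in t if a in block) for block in partition
        )
        for t in relation
    }


# The relation R' of equation (1) and the partition {{A,B},{C,D}}.
R1 = make_relation("ABCD", ["1111", "1122", "1212", "2211", "2222"])
X1 = [frozenset("AB"), frozenset("CD")]
Q = quotient(R1, X1)
```

Each tuple of the relation yields exactly one tuple of the quotient, so `len(Q)` equals `len(R1)`, in line with the cardinality remark above.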

2.5 Independent partitions 

For a given relation R over some scheme S, we are after decompositions of R such that R equals the join 
of the obtained components. Each decomposition of this kind corresponds to a certain partition of S. 

Definition 2 (independent partition) Let $R$ be a relation over some scheme $S$. For any $\mathfrak{X} \in \Pi(S)$, $\mathfrak{X}$ is
an independent partition of $S$ with respect to $R$ if $R = \bowtie_{T \in \mathfrak{X}} R \upharpoonright T$. The set of all independent partitions
of $S$ with respect to $R$ is denoted by $\mathrm{in}_R(S)$, or shortly $\mathrm{in}(S)$ if $R$ is understood. If a partition is not
independent, it is dependent.

Note that in(S) is nonempty since it necessarily contains the trivial partition. 
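
Definition 2 can be checked mechanically by joining the projections and comparing with the relation. The sketch below is illustrative only; the tuple encoding (a `frozenset` of (attribute, value) pairs, with row "1122" standing for a1 b1 c2 d2) and all function names are assumptions of this example.

```python
from functools import reduce


def make_relation(attrs, rows):
    return {frozenset(zip(attrs, row)) for row in rows}


def project(relation, attrs):
    """Restrict every tuple to the given attributes."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}


def join(r1, r2):
    """Join of two relations over disjoint schemes."""
    return {t1 | t2 for t1 in r1 for t2 in r2}


def is_independent(relation, partition):
    """True iff the join of the projections on the blocks gives back the relation."""
    projections = [project(relation, block) for block in partition]
    return reduce(join, projections, {frozenset()}) == relation


R1 = make_relation("ABCD", ["1111", "1122", "1212", "2211", "2222"])  # R' of (1)
complete = make_relation("AB", ["11", "12", "21", "22"])

trivial_ok = is_independent(R1, [frozenset("ABCD")])
pairs_ok = is_independent(R1, [frozenset("AB"), frozenset("CD")])
singletons_ok = is_independent(complete, [frozenset("A"), frozenset("B")])
```

The trivial partition always passes the test, which matches the remark that in(S) is never empty.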

Proposition 1 For every independent partition X, R/X is the complete relation over X. 

Informally speaking, the object of the present study is the independent partition with the maximum 
number of equivalence classes, provided it is unique. 

3 Correlation in Relations 

In this section we define correlation in relations and quotient relations. From now on assume an arbitrary
but fixed scheme $S$ and relation $R$ over it.

3.1 Correlated subsets of ground sets

In this subsection, the ground sets are schemes. 

Definition 3 (correlated subsets of schemes) Let $S = \{A_1, A_2, \ldots, A_n\}$ and let $T$ be some nonempty
subscheme $\{A_{i_1}, A_{i_2}, \ldots, A_{i_m}\}$ where $1 \leq i_1 < i_2 < \cdots < i_m \leq n$. $T$ is self-correlated with respect to $R$, or
shortly correlated with respect to $R$, iff

$$\exists x_1 \in A_{i_1}\ \exists x_2 \in A_{i_2} \cdots \exists x_m \in A_{i_m} : \{x_1 x_2 \cdots x_m\} \notin R \upharpoonright T \tag{4}$$

We denote that fact by $\mathrm{corr}_R(T)$ or $\mathrm{corr}(T)$ if $R$ is understood. The opposite concept is uncorrelated. The
family $\{T \subseteq S \mid \mathrm{corr}_R(T)\}$, in case it is nonempty, is called the correlation family of $R$.

Note that no minimal correlated subset is a singleton. The following result re-states correlation of a 
subscheme in terms of the projection of the relation on it. 

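Lemma 1 Let $T \subseteq S$. Then $\mathrm{corr}(T)$ iff $R \upharpoonright T \subset \bowtie_{X \in T} R \upharpoonright \{X\}$.

Proof: First assume $\mathrm{corr}(T)$. By Definition 3 there is an element in every attribute from $T$ such that the
tuple of those elements does not occur in $R \upharpoonright T$. On the other hand, the tuples of $\bowtie_{X \in T} R \upharpoonright \{X\}$ are all
possible combinations of the elements of the attributes in $T$. Therefore, $R \upharpoonright T \subset \bowtie_{X \in T} R \upharpoonright \{X\}$.

In the other direction, assume $\neg\mathrm{corr}(T)$. The negation of expression (4) in Definition 3 is but another
way to write $R \upharpoonright T = \bowtie_{X \in T} R \upharpoonright \{X\}$. □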
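
Lemma 1 suggests a simple counting test: since the projection on T is always contained in the join of the single-attribute projections, T is correlated exactly when the projection has fewer tuples than that join. The Python sketch below illustrates this under the assumptions of this example (tuples as `frozenset`s of (attribute, value) pairs, row "1122" standing for a1 b1 c2 d2, hypothetical function names).

```python
from math import prod


def make_relation(attrs, rows):
    return {frozenset(zip(attrs, row)) for row in rows}


def project(relation, attrs):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}


def correlated(relation, attrs):
    """Counting form of Lemma 1: some combination of occurring values is missing."""
    all_combinations = prod(len(project(relation, {a})) for a in attrs)
    return len(project(relation, attrs)) < all_combinations


R1 = make_relation("ABCD", ["1111", "1122", "1212", "2211", "2222"])  # R' of (1)
ab = correlated(R1, set("AB"))   # a2 b1 never occurs together
ac = correlated(R1, set("AC"))   # all four combinations occur
```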

As the next result establishes, with respect to the poset (S,C), every correlated subset is upward 
closed, while every uncorrelated subset is downward closed. 

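Proposition 2 If $\mathrm{corr}(T)$ for some $T \subseteq S$ then $\forall Z_{T \subseteq Z \subseteq S} : \mathrm{corr}(Z)$. If $\neg\mathrm{corr}(T)$ for some $T \subseteq S$ then
$\forall Z_{Z \subseteq T} : \neg\mathrm{corr}(Z)$.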

It is obvious that the correlation family, if it exists, is a cover of the scheme. Furthermore, it does not 
exist iff the relation is complete. The interesting part of a correlation family is the sub-family comprising 
the minimal correlated sets. However, that sub-family does not necessarily cover the scheme. We want 
to define a family that both covers the scheme—because we are ultimately interested in a partition of the 
scheme—and is a Sperner family, since the implied members of the family are of no interest. 

Definition 4 (mincor family) A mincor of $R$ is every minimal, self-correlated with respect to $R$,
subscheme $T \subseteq S$. Further, $\mathrm{mincors}(R) \stackrel{\mathrm{def}}{=} \{T \subseteq S \mid T \text{ is a mincor}\}$ and
$\mathrm{singletons}(R) \stackrel{\mathrm{def}}{=} \{\{A\} \mid A \in S \land \neg\exists X \in \mathrm{mincors}(R) : A \in X\}$. The mincor family of $R$, denoted by $\mathrm{MF}(R)$,
is $\mathrm{MF}(R) = \mathrm{mincors}(R) \cup \mathrm{singletons}(R)$.

For example, consider $R'$ defined in (1). Clearly, $\mathrm{corr}_{R'}(\{A,B\})$ and $\mathrm{corr}_{R'}(\{C,D\})$
because of the lack of both $a_2$ and $b_1$ in any tuple and the lack of both $c_2$ and $d_1$ in any tuple, respectively.
The other four two-element subsets of $S$ are uncorrelated. Then $\mathrm{singletons}(R') = \emptyset$ and therefore
$\mathrm{MF}(R') = \{\{A,B\}, \{C,D\}\}$.
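
The mincor family of a small relation can be computed by brute force over the subschemes, smallest first. The following Python sketch is an illustration under the assumptions of this example (tuples as `frozenset`s of (attribute, value) pairs, row "1122" standing for a1 b1 c2 d2, hypothetical function names); it enumerates all subsets and is therefore exponential in the number of attributes.

```python
from itertools import combinations
from math import prod


def make_relation(attrs, rows):
    return {frozenset(zip(attrs, row)) for row in rows}


def project(relation, attrs):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}


def correlated(relation, attrs):
    return len(project(relation, attrs)) < prod(
        len(project(relation, {a})) for a in attrs
    )


def mincor_family(relation, scheme):
    mincors = []
    for size in range(2, len(scheme) + 1):
        for subset in map(frozenset, combinations(sorted(scheme), size)):
            # By upward closure, a correlated set is minimal
            # iff it contains no smaller mincor.
            if not any(m < subset for m in mincors) and correlated(relation, subset):
                mincors.append(subset)
    covered = set().union(*mincors)
    singletons = [frozenset({a}) for a in scheme if a not in covered]
    return set(mincors) | set(singletons)


R1 = make_relation("ABCD", ["1111", "1122", "1212", "2211", "2222"])  # R' of (1)
mf = mincor_family(R1, "ABCD")
```

Starting at size 2 reflects the remark that no mincor is a singleton.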

Proposition 3 With respect to $S$ and $R$, $\mathrm{MF}(R)$ exists and is unique.

If $R$ is complete then $\mathrm{MF}(R)$ consists of singletons. Clearly, $\mathrm{MF}(R) \in K(S)$, and thus $\mathrm{CC}(\mathrm{MF}(R)) \in \Pi(S)$.

3.2 Correlation in quotient relations 

The following result establishes an important connection between self-correlation in a partition of the
scheme and self-correlation in the scheme itself. More specifically, Lemma 2 is used to prove Lemma 3
and the latter is used in the proof of Lemma 7.

Lemma 2 For any $\mathfrak{X} \in \Pi(S)$ and $\mathfrak{X}' \subseteq \mathfrak{X}$:
$$\mathrm{corr}_{R/\mathfrak{X}}(\mathfrak{X}') \leftrightarrow \mathrm{corr}_R(\bigcup \mathfrak{X}')$$

Proof: Assume $\mathrm{corr}_{R/\mathfrak{X}}(\mathfrak{X}')$. Let $\mathfrak{X}' = \{Y_1, Y_2, \ldots, Y_m\}$. So, $(R/\mathfrak{X}) \upharpoonright \mathfrak{X}'$ does not contain some $m$-tuple
$\{U_1, U_2, \ldots, U_m\}$ such that $U_i \in R \upharpoonright Y_i$ for $1 \leq i \leq m$. Then $R \upharpoonright \bigcup \mathfrak{X}'$ does not contain $\bigcup \{U_1, U_2, \ldots, U_m\}$.
In the other direction, assume $\mathrm{corr}_R(\bigcup \mathfrak{X}')$ where $\bigcup \mathfrak{X}'$ is a subset $S'$ of $S$. Let $S' = \{A_1, A_2, \ldots, A_n\}$.
That is, $R \upharpoonright S'$ does not contain some $n$-tuple $\{W_1, W_2, \ldots, W_n\}$ such that $W_i \in A_i$ for $1 \leq i \leq n$. Let
$\mathfrak{X}' = \{Y_1, Y_2, \ldots, Y_m\}$. Then $(R/\mathfrak{X}) \upharpoonright \mathfrak{X}'$ does not contain the $m$-tuple $\{U_1, U_2, \ldots, U_m\}$ where $U_i \in R \upharpoonright Y_i$
for $1 \leq i \leq m$. □

As an example that illustrates Lemma 2 consider $R'$ and $\mathfrak{X}_1$ from Section 2.4. Clearly, $\mathfrak{X}_1 = \{\{A,B\}, \{C,D\}\}$ is
self-correlated with respect to $R'/\mathfrak{X}_1$ as $R'/\mathfrak{X}_1$ does not contain, among others, the tuple $\{\{a_1,b_1\}\{c_1,d_2\}\}$.
That implies $\bigcup \mathfrak{X}_1 = \{A,B,C,D\}$ is self-correlated with respect to $R'$: since $\{\{a_1,b_1\}\{c_1,d_2\}\}$ is not an
element of $R'/\mathfrak{X}_1$, it must be the case that $\{a_1 b_1 c_1 d_2\}$ is not an element of $R'$ (and indeed it is not). In the
other direction, the fact that $\{a_1 b_1 c_1 d_2\} \notin R'$ implies $\{\{a_1,b_1\}\{c_1,d_2\}\} \notin R'/\mathfrak{X}_1$.

The next result establishes that for every mincor Y of a quotient relation there is a way to pick elements 
from every element of Y such that the collection of those elements is a mincor of the original relation R. 

Lemma 3 VX € n(S) MX) € mincors(/?/x) 3 Z d : |Z| = A UZ € mincors(/?). 

Proof: Assume ^ G mincors(^/x)- Clearly, there is some Z d such that UZ is correlated with respect 
to R because d is reflexive and \JX) is correlated with respect to R by Lemma| 2 ] Now consider any Z'd 2 ) 
such that |Z'| < \^\. There exists some Xj' such that Z d But is uncorrelated with respect 
to R/X because is a mincor of R jx and so every proper subset of X) is uncorrelated with respect to 
R/X. Note that X)' being uncorrelated with respect to ^/x implies UZ' is uncorrelated with respect to R 
by Lemma[ 2 l It follows that for any Z d such that corrR(UZ)—and we established such a Z exists—it 
is the case that |Z| = |ip|. 

So, there exists aZ <^X) such that |Z| = |i 5 | and UZ is correlated with respect to R. Furthermore, 
there does not exist Z <^X) such that |Z| < |ip| and UZ is correlated with respect to R. Consider any 
Z d 2 } such that UZ is correlated with respect to /?. As |Z| = |i 5 |, every element of 5 ^ is a superset of 
precisely one element of Z. 

First assume all elements of Z are singletons. In this case no proper subset of UZ is correlated with 
respect to R. Suppose the contrary, namely that some W C UZ is correlated with respect to R and deduce 
there is some Z" d ^ such that W = UZ", thus |Z"| < |i 5 |, such that UZ" is correlated with respect to R. 
Since no proper subset of UZ is correlated with respect to R, UZ is a mincor with respect to R and we are 
done with the proof. 

Now assume not all elements of Z are singletons. It trivially follows there exists a minimal set Z d Z 
such that |Z| = |Z| (thus |Z| = |i^|) such that UZ is correlated with respect to R. □ 

4 Results on Independent Partitions 

This section provides important auxiliary results concerning independent partitions. In Subsection 4.1
we investigate the connection between independence and self-correlation. In Subsection 4.2 we prove that the
meet of independent partitions is an independent partition.

4.1 Independence and the mincor family

The following lemma establishes that partition independence is preserved under removal of attributes. 
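Lemma 4 For every $\mathfrak{Y} \in \mathrm{in}(S)$ and every family $\mathfrak{X}$ that stands to $\mathfrak{Y}$ in the sub-family relation introduced in Section 2.1: $\mathfrak{X} \in \mathrm{in}_{R \upharpoonright \bigcup \mathfrak{X}}(\bigcup \mathfrak{X})$.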

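Proof: Let $Q = R \upharpoonright \bigcup \mathfrak{X}$. We prove that $Q = \bowtie_{Z \in \mathfrak{X}} (Q \upharpoonright Z)$. In one direction, $Q \subseteq \bowtie_{Z \in \mathfrak{X}} (Q \upharpoonright Z)$ follows
immediately from the definitions of relation join and projection. In the other direction, consider any
tuple $t$ in $\bowtie_{Z \in \mathfrak{X}} (Q \upharpoonright Z)$. Let $v$ be any tuple in $\bowtie_{Z \in \mathfrak{Y}} (R \upharpoonright Z)$ such that $t = v \upharpoonright \bigcup \mathfrak{X}$. But $v \in R$ because $\mathfrak{Y}$ is
independent and thus $R = \bowtie_{Z \in \mathfrak{Y}} (R \upharpoonright Z)$. As $v \in R$, it follows that $v \upharpoonright \bigcup \mathfrak{X} \in Q$. But $v \upharpoonright \bigcup \mathfrak{X}$ is $t$, therefore $t \in Q$,
and so $\bowtie_{Z \in \mathfrak{X}} (Q \upharpoonright Z) \subseteq Q$. □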

The next lemma is pivotal. It shows that the mincors respect independent partitions, in the sense that no
mincor can intersect more than one element of an independent partition.

Lemma 5 $\forall \mathfrak{Y} \in \mathrm{in}(S)\ \forall W \in \mathrm{mincors}(R)\ \exists Y \in \mathfrak{Y} : W \subseteq Y$.

Proof: Assume the contrary. Then there is a mincor $W$ that has nonempty intersection with more than
one set from $\mathfrak{Y}$. Suppose $W$ has nonempty intersection with precisely $t$ sets from $\mathfrak{Y}$ for some $t$ such that
$2 \leq t \leq q$. Let $Y_1, Y_2, \ldots, Y_t$ be precisely those sets from $\mathfrak{Y}$ that have nonempty intersection with $W$.
Let $W_i = W \cap Y_i$, for $1 \leq i \leq t$. Clearly, $\bigcup_{i=1}^{t} W_i = W$. By Lemma 4,

$$R \upharpoonright W = \bowtie_{1 \leq i \leq t} R \upharpoonright W_i$$

Every $W_i$ is a proper subset of $W$. But $W$ is a minimal correlated set. That implies $\neg\mathrm{corr}(W_i)$, for
$1 \leq i \leq t$. Apply Lemma 1 to conclude that $R \upharpoonright W_i = \bowtie_{x \in W_i} R \upharpoonright \{x\}$. Then,

$$R \upharpoonright W = \bowtie_{1 \leq i \leq t}\ \bowtie_{x \in W_i} R \upharpoonright \{x\}$$

Obviously, $\bowtie_{1 \leq i \leq t} \bowtie_{x \in W_i} R \upharpoonright \{x\} = \bowtie_{x \in W} R \upharpoonright \{x\}$. Then, $R \upharpoonright W = \bowtie_{x \in W} R \upharpoonright \{x\}$. By Lemma 1 that implies
$\neg\mathrm{corr}(W)$. □

Furthermore, merging mincors also yields sets that respect independent partitions. 

Corollary 1 $\forall \mathfrak{Y} \in \mathrm{in}(S) : \mathrm{CC}(\mathrm{MF}(R)) \sqsubseteq \mathfrak{Y}$

Proof: Assume the contrary. Then for some $R$ on $S$ and $\mathfrak{Y} \in \mathrm{in}(S)$:

$$\exists X \in \mathrm{CC}(\mathrm{MF}(R))\ \forall Y \in \mathfrak{Y}\ \exists A \in X : A \notin Y$$

First note that $X$ is not a singleton, otherwise $X$ would be contained in some set from $\mathfrak{Y}$. So, $|X| \geq 2$ and
according to Definition 4 $X$ is the union of one or more mincors, each of size $\geq 2$, and $X$ is connected.
But by assumption $X$ is not a subset of any set from $\mathfrak{Y}$ and so there has to be some mincor $W \subseteq X$ that
has nonempty intersection with at least two sets from $\mathfrak{Y}$. However, that contradicts Lemma 5. □

Note that $\mathrm{CC}(\mathrm{MF}(R))$ is not necessarily an independent partition. For example, consider $R'$ defined in (1).
As explained in Section 3.1, $\mathrm{MF}(R') = \{\{A,B\}, \{C,D\}\}$ and thus $\mathrm{CC}(\mathrm{MF}(R')) = \{\{A,B\}, \{C,D\}\}$,
too. But $\{\{A,B\}, \{C,D\}\}$ is not an independent partition with respect to $R'$. In fact, there is no
independent partition of $S$ except for the trivial partition as $|R'|$ is a prime number.

Now consider another relation $R''$ on the same scheme:

$$R'' = \{\{a_1 b_1 c_1 d_1\}, \{a_1 b_1 c_1 d_2\}, \{a_1 b_1 c_2 d_2\}, \{a_1 b_2 c_1 d_1\}, \{a_1 b_2 c_1 d_2\}, \{a_1 b_2 c_2 d_2\}, \{a_2 b_2 c_1 d_1\}, \{a_2 b_2 c_1 d_2\}, \{a_2 b_2 c_2 d_2\}\}$$

Again $\mathrm{MF}(R'') = \{\{A,B\}, \{C,D\}\} = \mathrm{CC}(\mathrm{MF}(R''))$, just as in the case of $R'$. Now $\{\{A,B\}, \{C,D\}\}$ is an
independent partition with respect to $R''$ because $R'' = R'' \upharpoonright \{A,B\} \bowtie R'' \upharpoonright \{C,D\}$.

So, in the case of $R''$, the connected components of the mincor family constitute an independent
partition, while that is not true for $R'$, although the mincor families of both relations are the same. We
conclude that computing the mincor family does not suffice to obtain an independent partition. Therefore, 
we use a more involved approach in which the computation of the mincor family is but the first step 
towards the computation of the maximum independent partition. 
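
The contrast between the two relations can be verified by counting. The worked calculation below uses only the tuples listed above; it relies on the fact that a relation is always contained in the join of its projections, so equal cardinalities imply equality.

```latex
\begin{align*}
R'' \upharpoonright \{A,B\} &= \{\{a_1 b_1\}, \{a_1 b_2\}, \{a_2 b_2\}\} \\
R'' \upharpoonright \{C,D\} &= \{\{c_1 d_1\}, \{c_1 d_2\}, \{c_2 d_2\}\} \\
|R'' \upharpoonright \{A,B\} \bowtie R'' \upharpoonright \{C,D\}| &= 3 \cdot 3 = 9 = |R''| \\
R' \upharpoonright \{A,B\} &= \{\{a_1 b_1\}, \{a_1 b_2\}, \{a_2 b_2\}\} \\
R' \upharpoonright \{C,D\} &= \{\{c_1 d_1\}, \{c_1 d_2\}, \{c_2 d_2\}\} \\
|R' \upharpoonright \{A,B\} \bowtie R' \upharpoonright \{C,D\}| &= 3 \cdot 3 = 9 > 5 = |R'|
\end{align*}
```

Both relations have the same two projections, yet only the nine-tuple relation equals their join.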


4.2 The meet of independent partitions 

The following lemma allows us to define the maximum independent partition as the meet of all
independent partitions.

Lemma 6 $\forall \mathfrak{X}, \mathfrak{Y} \in \mathrm{in}(S) : \mathfrak{X} \sqcap \mathfrak{Y} \in \mathrm{in}(S)$.

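Proof: (sketch) Let $\mathfrak{X}, \mathfrak{Y} \in \mathrm{in}(S)$. We assume $\mathfrak{X} \cup \mathfrak{Y}$ is connected. There is no true loss of generality
in that because the proof below can be done componentwise if $\mathfrak{X} \cup \mathfrak{Y}$ is not connected. Relative to an
arbitrary element of $\mathfrak{X}$, say $X_1$, we define the family $\mathfrak{Z} = \{Z_0, Z_1, \ldots, Z_k\}$ over $S$ as follows. $\mathfrak{Z}$ is a partition
of $S$ and its elements are constructed in an ascending order of the index according to the following rule:

$$Z_i = \begin{cases}
X_1, & \text{if } i = 0 \\
\bigcup \{A \setminus Z_{i-1} \mid A \in \mathfrak{Y} \land A \cap Z_{i-1} \neq \emptyset\}, & \text{if } i \text{ is odd} \\
\bigcup \{A \setminus Z_{i-1} \mid A \in \mathfrak{X} \land A \cap Z_{i-1} \neq \emptyset\}, & \text{if } i \text{ is even and } i > 0
\end{cases}$$

Let us define $B_i = \{\bigcup_{j=0}^{i} Z_j\} \sqcap \mathfrak{X} \sqcap \mathfrak{Y}$ for $0 \leq i \leq k$. Clearly, $B_0 = \{X_1\} \sqcap \mathfrak{Y}$, $B_i = B_{i-1} \cup (\{Z_i\} \sqcap \mathfrak{X} \sqcap \mathfrak{Y})$
for $1 \leq i \leq k$ and $B_k = \mathfrak{X} \sqcap \mathfrak{Y}$. Furthermore, $\bigcup B_k = S$ and thus $R \upharpoonright \bigcup B_k = R$. We prove by induction on $i$
that for all $i$ such that $0 \leq i \leq k$:

$$R \upharpoonright \bigcup B_i = \bowtie_{C \in B_i} R \upharpoonright C \tag{5}$$

and hence the result follows.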

Basis. Let $i = 0$. Let the elements of $\mathfrak{Y}$ that have nonempty intersection with $X_1$ be called $Y_1, \ldots, Y_j$.
Obviously, there is at least one of them. The claim is that $R \upharpoonright X_1 = \bowtie_{1 \leq l \leq j} R \upharpoonright (X_1 \cap Y_l)$. That follows
immediately from Lemma 4.

Inductive Step. Assume the claim holds for some $B_{i-1}$ such that $0 \leq i-1 < k$ and consider $B_i$. As already
mentioned, $B_i = B_{i-1} \cup (\{Z_i\} \sqcap \mathfrak{X} \sqcap \mathfrak{Y})$.

Without loss of generality, assume $i$ is odd. Very informally speaking, $Z_i$ is the union of some
elements of $\mathfrak{Y}$ that overlap with some elements (from $\mathfrak{X}$) in $B_{i-1}$, minus the overlap. Therefore, we can
write $B_i = B_{i-1} \cup (\{Z_i\} \sqcap \mathfrak{X})$ because under the current assumption, it is $\mathfrak{X}$ rather than $\mathfrak{Y}$ that dictates the
grouping together of the elements of $Z_i$ in $B_i$. More specifically, since $i \neq k$, there are elements from $\mathfrak{X}$
whose elements do not appear in the current $B_i$; those elements of $\mathfrak{X}$ dictate the aforementioned grouping.
So, $B_i$ is the union of two disjoint sets whose elements are from $\mathfrak{X} \sqcap \mathfrak{Y}$, namely $B_{i-1}$ and $\{Z_i\} \sqcap \mathfrak{X}$.

By the inductive hypothesis, $R \upharpoonright \bigcup B_{i-1} = \bowtie_{C \in B_{i-1}} R \upharpoonright C$.

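Consider $\{Z_i\} \sqcap \mathfrak{X}$ and call its elements $T_1, \ldots, T_m$. Without loss of generality, consider $T_1$. Our
immediate goal is to prove that $R \upharpoonright ((\bigcup B_{i-1}) \cup T_1) = \bowtie_{C \in B_{i-1} \cup \{T_1\}} R \upharpoonright C$. Note that $T_1$ is a subset of some
$Y' \in \mathfrak{Y}$ such that $Y'$ has nonempty intersection with $\bigcup B_{i-1}$, $T_1$ itself being disjoint with $\bigcup B_{i-1}$. Furthermore,
$T_1$ is the intersection of $Y'$ with some $X' \in \mathfrak{X}$. $X'$ is disjoint with $\bigcup B_{i-1}$, otherwise the elements of $T_1$
would be part of $\bigcup B_{i-1}$. Furthermore, every element of $B_{i-1}$ is a subset of some element of $\mathfrak{X}$ that is not
$X'$. Let the elements of $\mathfrak{X}$ that have subsets-elements of $B_{i-1}$ be $X_1, \ldots, X_p$. Note that $X_1 \cup \cdots \cup X_p = \bigcup B_{i-1}$.
By Lemma 4 it is the case that

$$R \upharpoonright (X_1 \cup \cdots \cup X_p \cup T_1) = R \upharpoonright X_1 \bowtie \cdots \bowtie R \upharpoonright X_p \bowtie R \upharpoonright T_1 \tag{6}$$

since $T_1$ is a subset of $X'$ and $X'$ is none of $X_1, \ldots, X_p$. However, $X_1 \cup \cdots \cup X_p \cup T_1 = (\bigcup B_{i-1}) \cup T_1$ by an
earlier observation and $R \upharpoonright X_1 \bowtie \cdots \bowtie R \upharpoonright X_p = \bowtie_{C \in B_{i-1}} R \upharpoonright C$. Substitute that in equation (6) to obtain

$$R \upharpoonright (\bigcup B_{i-1} \cup T_1) = \Big(\bowtie_{C \in B_{i-1}} R \upharpoonright C\Big) \bowtie R \upharpoonright T_1 = \bowtie_{C \in B_{i-1} \cup \{T_1\}} R \upharpoonright C \tag{7}$$

which is what we wanted to prove with respect to $T_1$.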

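We can use (7) as the basis of a nested induction. More specifically, we prove that

$$R \upharpoonright ((\bigcup B_{i-1}) \cup T_1 \cup \cdots \cup T_k) = \Big(\bowtie_{C \in B_{i-1}} R \upharpoonright C\Big) \bowtie R \upharpoonright T_1 \bowtie \cdots \bowtie R \upharpoonright T_k$$

implies

$$R \upharpoonright ((\bigcup B_{i-1}) \cup T_1 \cup \cdots \cup T_{k+1}) = \Big(\bowtie_{C \in B_{i-1}} R \upharpoonright C\Big) \bowtie R \upharpoonright T_1 \bowtie \cdots \bowtie R \upharpoonright T_{k+1}$$

for any $k \in \{1, 2, \ldots, m-1\}$. The nested induction can be proved in a straightforward manner, having in
mind the proof of (7). That implies the desired:

$$R \upharpoonright ((\bigcup B_{i-1}) \cup T_1 \cup \cdots \cup T_m) = \Big(\bowtie_{C \in B_{i-1}} R \upharpoonright C\Big) \bowtie R \upharpoonright T_1 \bowtie \cdots \bowtie R \upharpoonright T_m$$

And that concludes the proof because $\bigcup B_i = \bigcup B_{i-1} \cup T_1 \cup \cdots \cup T_m$. □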

The proof of Lemma 6 relies on the fact that all sets we consider are finite.

As a corollary of Lemma 6, the maximum independent partition, which is the object of our study, is
well-defined: $\sqcap\,\mathrm{in}(S)$ exists, it is unique, and is an element of $\mathrm{in}(S)$. For notational convenience we
introduce another term for that object. We say that $\sqcap\,\mathrm{in}_R(S)$ is the focus of $R$ and denote it by $\mathrm{foc}(R)$. A
trivial observation is that $\mathrm{in}_R(S)$ coincides with $\uparrow\mathrm{foc}(R)$.


5 A Fixed Point Characterisation of the Maximum Independent Partition 

In this section we identify the object of our study as the least fixed point of $\alpha$, where $\alpha$ is a transformer on
the lattice of all partitions of $S$. Furthermore, we present an iterative fixed point approximation procedure
for computing the maximum independent partition.

5.1 Function $\alpha$

First we introduce a helper function. Let $A$ be a ground set. The function $\Cup$ maps superfamilies over $A$ to
families over $A$ as follows. For any superfamily $\mathcal{Z}$,

$$\Cup(\mathcal{Z}) = \{\bigcup Z \mid Z \in \mathcal{Z}\}$$

Syntactically speaking, $\Cup$ removes the innermost pairs of parentheses. For instance, suppose $A = \{a,b,c,d\}$
and $\mathcal{Z} = \{\{\{a\},\{b,c\}\},\{\{d\}\}\}$. Then $\Cup(\mathcal{Z}) = \{\{a,b,c\},\{d\}\}$.
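
The helper can be sketched in a few lines of Python. This is only an illustration, assuming that sets are modelled as `frozenset` values; the name `flatten` is chosen here and does not come from the text.

```python
def flatten(superfamily):
    """Map a superfamily (a set of families) to the family of the unions of its members."""
    return frozenset(frozenset().union(*family) for family in superfamily)


fs = frozenset  # shorthand, used only to keep the example readable

# The example from the text: {{{a},{b,c}},{{d}}} is mapped to {{a,b,c},{d}}.
example = fs({fs({fs("a"), fs("bc")}), fs({fs("d")})})
result = flatten(example)
print(sorted("".join(sorted(block)) for block in result))  # ['abc', 'd']
```

Each inner family is replaced by the union of its members, which is the "removal of the innermost parentheses" described above.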

We now define the central function of the present study. It takes a partition of $S$, identifies the mincors
of the corresponding quotient relation, merges the overlapping mincors, and uses $\Cup$ to map the result back
to a partition of $S$.

Definition 5 (function $\alpha$) $\alpha_R : \Pi(S) \to \Pi(S)$, shortly $\alpha$ when $R$ is understood, is defined as follows for
any $\mathfrak{X} \in \Pi(S)$:


$$\alpha_R(\mathfrak{X}) = \Cup(\mathrm{CC}(\mathrm{MF}(R/\mathfrak{X})))$$
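
The definition can be made concrete with a brute-force Python sketch. It rests on several assumptions that are not fixed by this section: a relation is a list of dicts from attribute names to values, the domain of an attribute is the set of values that actually occur in the relation, and a set of blocks is self-correlated exactly when some combination of block values occurs in no tuple. All function names are hypothetical, and the mincor family and its connected components are computed in one merging pass rather than as separate MF and CC steps. The relation `R1` encodes the five tuples of R' listed in the worked example at the end of this section.

```python
from itertools import combinations


def project(R, attrs):
    """Distinct sub-tuples of relation R (a list of dicts) over the attributes attrs."""
    attrs = sorted(attrs)
    return {tuple(t[a] for a in attrs) for t in R}


def self_correlated(R, blocks):
    """True iff some combination of values of the blocks occurs in no tuple of R.

    Each block is a set of attributes of R, i.e. one attribute of the quotient
    relation; its values are the sub-tuples of R over that block.
    """
    combos = 1
    for block in blocks:
        combos *= len(project(R, block))
    return len(project(R, frozenset().union(*blocks))) < combos


def mincors(R, partition):
    """Minimal self-correlated subsets of the partition, searched by increasing size."""
    found = []
    for size in range(2, len(partition) + 1):
        for T in map(frozenset, combinations(partition, size)):
            if not any(M <= T for M in found) and self_correlated(R, T):
                found.append(T)
    return found


def alpha(R, partition):
    """One application of the transformer to a partition of the attributes of R."""
    components = [frozenset([block]) for block in partition]  # start from singletons
    for M in mincors(R, partition):
        touching = [C for C in components if C & M]
        rest = [C for C in components if not C & M]
        components = rest + [frozenset().union(*touching)]  # merge overlapping mincors
    # Flatten every connected component into a single block of attributes.
    return frozenset(frozenset().union(*C) for C in components)


# R' from the worked example: a1b1c1d1, a1b1c2d2, a1b2c1d2, a2b2c1d1, a2b2c2d2.
R1 = [dict(zip("ABCD", t)) for t in ("1111", "1122", "1212", "2211", "2222")]
bottom = frozenset(frozenset(a) for a in "ABCD")
step1 = alpha(R1, bottom)   # {{A,B},{C,D}}
step2 = alpha(R1, step1)    # {{A,B,C,D}}
```

The search over subsets is exponential in the number of blocks, so the sketch is meant for small examples only.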

Notably, $\alpha$ is not monotone in general as demonstrated by the following example. Let $S = \{A,B,C,D,E\}$
and let each attribute have precisely two values, say $A = \{a_1, a_2\}$ and so on. Let $Q$ be the
relation obtained from the complete relation over $S$ after deleting all tuples containing $a_1b_1c_1$, all tuples
containing $d_2e_2$, and the tuples $\{a_2b_1c_1d_2e_1\}$, $\{a_2b_2c_1d_2e_1\}$. In other words,


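$$\begin{aligned}
Q = \{ & \{a_1b_1c_2d_1e_1\}, \{a_1b_1c_2d_1e_2\}, \{a_1b_1c_2d_2e_1\}, \{a_1b_2c_1d_1e_1\}, \{a_1b_2c_1d_1e_2\}, \{a_1b_2c_1d_2e_1\}, \\
& \{a_1b_2c_2d_1e_1\}, \{a_1b_2c_2d_1e_2\}, \{a_1b_2c_2d_2e_1\}, \{a_2b_1c_1d_1e_1\}, \{a_2b_1c_1d_1e_2\}, \{a_2b_1c_2d_1e_1\}, \\
& \{a_2b_1c_2d_1e_2\}, \{a_2b_1c_2d_2e_1\}, \{a_2b_2c_1d_1e_1\}, \{a_2b_2c_1d_1e_2\}, \{a_2b_2c_2d_1e_1\}, \{a_2b_2c_2d_1e_2\}, \{a_2b_2c_2d_2e_1\} \}
\end{aligned}$$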

Let us see which sets of attributes are self-correlated with respect to $Q$. The only two-element
subset of $S$ that is self-correlated is $\{D,E\}$. Further, $\{A,B,C\}$ is self-correlated. It follows $\mathrm{MF}(Q) =
\{\{A,B,C\},\{D,E\}\}$. Consider the following two partitions of $S$: $\mathfrak{X}_1 = \{\{A\}, \{B\}, \{C\}, \{D\}, \{E\}\}$ and
$\mathfrak{X}_2 = \{\{A\}, \{B,D\}, \{C,E\}\}$. Obviously, $\mathfrak{X}_1 \sqsubseteq \mathfrak{X}_2$. It is clear that $\alpha(\mathfrak{X}_1) = \{\{A,B,C\},\{D,E\}\}$.

Consider $\alpha(\mathfrak{X}_2)$. The set $\{\{B,D\},\{C,E\}\}$ is self-correlated because of the lack of $\{b_1,d_2\}$ and $\{c_1,e_2\}$
in any tuple, which in its turn is due to the fact that $d_2$ and $e_2$ do not occur together in any tuple of $Q$. The sets
$\{\{A\}, \{B,D\}\}$ and $\{\{A\}, \{C,E\}\}$ are uncorrelated. It follows that $\alpha(\mathfrak{X}_2) = \{\{A\}, \{B,C,D,E\}\}$, and thus
$\alpha(\mathfrak{X}_1) \not\sqsubseteq \alpha(\mathfrak{X}_2)$.
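
Several of the individual claims in this example can be checked mechanically. The Python sketch below rebuilds $Q$ from its description and tests self-correlation by counting, under the assumption that a set of blocks is self-correlated exactly when the number of distinct joint sub-tuples is smaller than the product of the numbers of distinct sub-tuples of each block. The helper names are hypothetical, and value $a_1$ is encoded as the character `"1"` under attribute `"A"`, and so on.

```python
from itertools import combinations, product


def project(R, attrs):
    """Distinct sub-tuples of relation R (a list of dicts) over the attributes attrs."""
    attrs = sorted(attrs)
    return {tuple(t[a] for a in attrs) for t in R}


def self_correlated(R, blocks):
    """True iff some combination of block values is missing from every tuple of R."""
    combos = 1
    for block in blocks:
        combos *= len(project(R, block))
    return len(project(R, set().union(*blocks))) < combos


ATTRS = "ABCDE"
complete = [dict(zip(ATTRS, values)) for values in product("12", repeat=5)]


def kept(t):
    """Apply the three deletions that define Q."""
    if (t["A"], t["B"], t["C"]) == ("1", "1", "1"):  # contains a1 b1 c1
        return False
    if (t["D"], t["E"]) == ("2", "2"):               # contains d2 e2
        return False
    # the two individually deleted tuples a2b1c1d2e1 and a2b2c1d2e1
    return "".join(t[a] for a in ATTRS) not in {"21121", "22121"}


Q = [t for t in complete if kept(t)]

correlated_pairs = [set(p) for p in combinations(ATTRS, 2)
                    if self_correlated(Q, [{a} for a in p])]
print(len(Q), correlated_pairs)
```

With this encoding, `Q` has 19 tuples, matching the listing above, and `{D, E}` is the only self-correlated pair of attributes.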

However, we have the following property of $\alpha$ that shall later be exploited.

Proposition 4 $\alpha$ is an inflationary function on $(\Pi(S), \sqsubseteq)$.

5.2 Independence and function $\alpha$

The following central result establishes that the independent partitions are precisely the fixed points of $\alpha$.
Theorem 1 $\forall \mathfrak{X} \in \Pi(S) : \mathfrak{X} \in \mathrm{in}(S) \Leftrightarrow \alpha(\mathfrak{X}) = \mathfrak{X}$.

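Proof: In one direction, assume $\mathfrak{X} \in \mathrm{in}(S)$. $R/\mathfrak{X}$ is complete by Proposition 1. By definition, that is,
$R/\mathfrak{X}$ is the Cartesian product of the domains of its attributes. By the definition of $\bowtie$,
$(R/\mathfrak{X}) \upharpoonright \mathfrak{X} = \bowtie_{Y \in \mathfrak{X}} (R/\mathfrak{X}) \upharpoonright \{Y\}$. It follows that $\neg\mathrm{corr}(\mathfrak{X})$ by an earlier lemma.
So, $\mathrm{mincors}(R/\mathfrak{X}) = \emptyset$ and $\mathrm{MF}(R/\mathfrak{X}) = \mathrm{singletons}(R/\mathfrak{X})$ by Definition 4. Then $\mathrm{CC}(\mathrm{MF}(R/\mathfrak{X})) =
\{\{A\} \mid A \in \mathfrak{X}\}$. Therefore, $\Cup(\mathrm{CC}(\mathrm{MF}(R/\mathfrak{X}))) = \{A \mid A \in \mathfrak{X}\} = \mathfrak{X}$. But $\Cup(\mathrm{CC}(\mathrm{MF}(R/\mathfrak{X})))$ is $\alpha(\mathfrak{X})$ by
definition. Therefore, $\alpha(\mathfrak{X}) = \mathfrak{X}$.

In the other direction, assume $\alpha(\mathfrak{X}) = \mathfrak{X}$. That is, $\Cup(\mathrm{CC}(\mathrm{MF}(R/\mathfrak{X}))) = \mathfrak{X}$, which in its turn implies
$\mathrm{CC}(\mathrm{MF}(R/\mathfrak{X})) = \{\{A\} \mid A \in \mathfrak{X}\}$ because $\mathrm{CC}(\mathrm{MF}(R/\mathfrak{X}))$ is a superfamily such that every element from $S$ is in
precisely one element of precisely one element of it. The remainder of the proof mirrors the above one.
□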
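
Independence itself can be tested directly from its definition, namely that the join of the projections of the relation over the blocks of the partition yields the relation again. The Python sketch below is one way to do that, assuming relations are lists of dicts and using the fact that the natural join of projections over pairwise disjoint attribute sets is their Cartesian product. The names `join_of_projections` and `is_independent`, and the sample relations, are illustrative only.

```python
from itertools import product


def project(R, attrs):
    """Distinct sub-tuples of R over attrs, each kept as (attribute, value) pairs."""
    attrs = sorted(attrs)
    return {tuple((a, t[a]) for a in attrs) for t in R}


def join_of_projections(R, partition):
    """Join of the projections of R over the blocks of the partition.

    The blocks are pairwise disjoint, so the natural join degenerates into
    the Cartesian product of the projections.
    """
    joined = set()
    for combo in product(*(project(R, block) for block in partition)):
        joined.add(frozenset(pair for part in combo for pair in part))
    return joined


def is_independent(R, partition):
    """True iff the join of the projections over the blocks gives back R."""
    return join_of_projections(R, partition) == {frozenset(t.items()) for t in R}


# A relation that factors into an {A,B} part and a {C} part.
R2 = [dict(zip("ABC", t)) for t in ("111", "112", "121", "122", "221", "222")]
print(is_independent(R2, [{"A", "B"}, {"C"}]))    # True
print(is_independent(R2, [{"A"}, {"B"}, {"C"}]))  # False
```

By Theorem 1, the partitions accepted by such a check are exactly the fixed points of the transformer.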

Having in mind the earlier observation that $\mathrm{in}_R(S)$ coincides with $\uparrow\mathrm{foc}(R)$, we derive the following
corollary of Theorem 1.

Corollary 2 $\uparrow\mathrm{foc}(R)$ is closed with respect to $\alpha$.

The following lemma says that the mincors of a quotient relation respect the focus of the relation in the
sense that for every mincor of $R/\mathfrak{X}$, the union of its elements is a subset of some element of the focus.

Lemma 7 $\forall \mathfrak{X} \in {\downarrow}\mathrm{foc}(R)\ \forall T \in \mathrm{mincors}(R/\mathfrak{X})\ \exists Y \in \mathrm{foc}(R) : \bigcup T \subseteq Y$.

Proof: Assume the contrary. That is, for some partition $\mathfrak{X}$ that refines the focus there is a mincor $T$
of $R/\mathfrak{X}$ such that $\bigcup T$ has nonempty intersection with at least two subsets, call them $Y_1$ and $Y_2$, of the
focus. Use Lemma 3 to conclude there is some $Z$, whose elements are subsets of the elements of $T$, such that
$|Z| = |T|$ and $\bigcup Z \in \mathrm{mincors}(R)$. Since $|Z| = |T|$, it must be the case that $\bigcup Z$ has nonempty
intersection with both $Y_1$ and $Y_2$. But the focus is an independent partition. We derived that a mincor of $R$,
namely $\bigcup Z$, intersects two distinct elements of an independent partition. That contradicts Lemma 5 directly. □

We already established (see Proposition 4) that $\alpha$ is an inflationary function. The next lemma, however,
establishes a certain restriction: the application of $\alpha$ on a dependent partition can yield another dependent
partition or at most the focus, and never an independent partition “above” the focus.

Lemma 8 ${\downarrow}\mathrm{foc}(R)$ is closed with respect to $\alpha$.

Proof: We prove that $\forall \mathfrak{X} \in {\downarrow}\mathrm{foc}(R) : \alpha(\mathfrak{X}) \sqsubseteq \mathrm{foc}(R)$. Recall that $\alpha(\mathfrak{X})$ is a partition of $S$ and it abstracts
$\mathfrak{X}$. Assume the claim is false. Then there is a partition $\mathfrak{X}$ such that $\mathfrak{X} \in {\downarrow}\mathrm{foc}(R)$ but $\alpha(\mathfrak{X}) \notin {\downarrow}\mathrm{foc}(R)$. Then
there is some $P \in \alpha(\mathfrak{X})$ such that $P$ has nonempty intersection with at least two elements, call them $Y_1$
and $Y_2$, of $\mathrm{foc}(R)$. However, $P$ is obtained from some $C$ that is a connected component, relative to the ground
set $\mathfrak{X}$, of the mincor family of $R/\mathfrak{X}$. Consider $C$. It is the union of one or more mincors of $R/\mathfrak{X}$, those
mincors being subsets of $\mathfrak{X}$.

Since $\mathfrak{X} \sqsubseteq \mathrm{foc}(R)$, no element of $\mathfrak{X}$ can intersect both $Y_1$ and $Y_2$. It follows that at least one mincor
$M \in C$ is such that $\bigcup M$ intersects both $Y_1$ and $Y_2$. But that contradicts Lemma 7. □

The next and final central result allows us to compute the focus of $R$ by an iterative application of $\alpha$,
starting with the partition into singletons.

Theorem 2 For some $m$ such that $1 \le m \le |S|$, $\alpha^m(\bot) = \mathrm{foc}(R)$.

Proof: Consider the sequence:

$$C = \bot,\ \alpha(\bot),\ \alpha^2(\bot),\ \ldots$$

It is a chain in the lattice $(\Pi(S), \sqsubseteq)$, as $\alpha(\mathfrak{X})$ abstracts $\mathfrak{X}$ for all $\mathfrak{X}$ (see Proposition 4), therefore all those
elements are comparable with respect to $\sqsubseteq$. $C$ has only a finite number of distinct elements as the said
lattice is finite.

First note that every element of $C$ is in ${\downarrow}\mathrm{foc}(R)$. Indeed, assuming the opposite immediately
contradicts Lemma 8.


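Then note that for every $\mathfrak{X} \in {\downarrow}\mathrm{foc}(R) \setminus \{\mathrm{foc}(R)\}$, it is the case that $\alpha(\mathfrak{X}) \neq \mathfrak{X}$. Assuming the opposite
implies $\mathfrak{X}$ is a fixed point of $\alpha$, contradicting Corollary 2. Proposition 4 implies a stronger fact: for every
$\mathfrak{X} \in {\downarrow}\mathrm{foc}(R) \setminus \{\mathrm{foc}(R)\}$, it is the case that $\mathfrak{X} \sqsubset \alpha(\mathfrak{X})$. But ${\downarrow}\mathrm{foc}(R)$ is a finite lattice. It follows immediately
that for some value $m$ not greater than $|S|$, $\alpha^m(\bot)$ equals the top of ${\downarrow}\mathrm{foc}(R)$, viz. $\mathrm{foc}(R)$. □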

We thus obtain Kleene’s iterative least fixed point approximation procedure fT\, however for inflationary 
functions instead of monotone ones. 

Corollary 3 The following algorithm: 

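$\mathfrak{X} \leftarrow \bot$

while $\mathfrak{X} \neq \alpha(\mathfrak{X})$

    $\mathfrak{X} \leftarrow \alpha(\mathfrak{X})$

return $\mathfrak{X}$

computes the least fixed point of $\alpha$, i.e., the maximum independent partition of $S$ with respect to $R$. □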
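
The loop does not depend on how the transformer is implemented, which a short Python sketch can make explicit. Here `least_fixed_point` takes any function on partitions; `toy_alpha` is a made-up inflationary transformer (it merges the two blocks joined by the first crossing edge of a fixed edge list) and is not the transformer of Definition 5. It only serves to show the iteration running for several rounds.

```python
def least_fixed_point(alpha, bottom):
    """Iterate alpha from bottom until the partition stops changing.

    Returns the fixed point together with the number of strict increases.
    """
    X, steps = bottom, 0
    while True:
        nxt = alpha(X)
        if nxt == X:
            return X, steps
        X, steps = nxt, steps + 1


def toy_alpha(edges):
    """Hypothetical inflationary transformer driven by a fixed list of edges."""
    def alpha(partition):
        for u, v in edges:
            block_u = next(b for b in partition if u in b)
            block_v = next(b for b in partition if v in b)
            if block_u != block_v:  # merge the two blocks and stop for this round
                return (partition - {block_u, block_v}) | {block_u | block_v}
        return partition            # nothing to merge: a fixed point
    return alpha


bottom = frozenset(frozenset([x]) for x in range(1, 6))
fixed, steps = least_fixed_point(toy_alpha([(1, 2), (3, 4), (2, 3)]), bottom)
print(sorted(sorted(b) for b in fixed), steps)  # [[1, 2, 3, 4], [5]] 3
```

Every round either merges blocks or stops, so the number of strict increases is bounded by the number of elements of the ground set, in line with the bound of Theorem 2.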

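Here is a small example illustrating the work of that algorithm. Consider $S$ and $R'$ defined in (1).
$\bot$ is $\{\{A\}, \{B\}, \{C\}, \{D\}\}$. Let us compute $\alpha(\bot)$, that is, $\Cup(\mathrm{CC}(\mathrm{MF}(R'/\bot)))$. $R'/\bot$ is the same
as the earlier $R'/\mathfrak{X}_2$, namely:

$$\begin{aligned}
\{ & \{\{a_1\}\{b_1\}\{c_1\}\{d_1\}\}, \{\{a_1\}\{b_1\}\{c_2\}\{d_2\}\}, \{\{a_1\}\{b_2\}\{c_1\}\{d_2\}\}, \\
& \{\{a_2\}\{b_2\}\{c_1\}\{d_1\}\}, \{\{a_2\}\{b_2\}\{c_2\}\{d_2\}\} \}
\end{aligned}$$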

Let us compute $\mathrm{CC}(\mathrm{MF}(R'/\bot))$. Having in mind that $\mathrm{MF}(R') = \{\{A,B\}, \{C,D\}\}$ as explained earlier,
conclude that $\mathrm{CC}(\mathrm{MF}(R'/\bot)) = \{\{\{A,B\}\}, \{\{C,D\}\}\}$. Therefore, $\Cup(\mathrm{CC}(\mathrm{MF}(R'/\bot))) = \{\{A,B\}, \{C,D\}\}$.
That differs from $\bot$ and the while loop is executed again. $R'/\alpha(\bot)$ is the same as the earlier $R'/\mathfrak{X}_1$,
namely:


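$$\begin{aligned}
R'/\alpha(\bot) = \{ & \{\{a_1b_1\}\{c_1d_1\}\}, \{\{a_1b_1\}\{c_2d_2\}\}, \{\{a_1b_2\}\{c_1d_2\}\}, \\
& \{\{a_2b_2\}\{c_1d_1\}\}, \{\{a_2b_2\}\{c_2d_2\}\} \}
\end{aligned}$$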

Let us compute $\mathrm{CC}(\mathrm{MF}(R'/\alpha(\bot)))$. To that end, note that $\alpha(\bot) = \{\{A,B\},\{C,D\}\}$ is self-correlated
with respect to $R'/\{\{A,B\},\{C,D\}\}$ because of the lack of, for instance, both $\{a_1,b_2\}$ and $\{c_1,d_1\}$ in
any tuple of $R'/\alpha(\bot)$. It follows that $\mathrm{CC}(\mathrm{MF}(R'/\alpha(\bot))) = \{\{\{A,B\},\{C,D\}\}\}$ and, therefore, $\alpha^2(\bot) =
\Cup(\mathrm{CC}(\mathrm{MF}(R'/\alpha(\bot)))) = \{\{A,B,C,D\}\}$. That differs from $\alpha(\bot)$ and the while loop is executed once
more. At the end of that execution, it turns out that $\alpha^3(\bot)$ equals $\alpha^2(\bot)$ and the algorithm terminates,
returning as the result $\{\{A,B,C,D\}\}$, the trivial partition.

6 Related Work 

An algorithm that factorizes a given relation into prime factors is proposed in [10, algorithm Prime
Factorization]. It runs in time $O(mn \lg n)$ where $m$ is the number of tuples and $n$ is the number of
attributes. Since $mn$ is the input size, that time complexity is very close to the optimum. The theoretical
foundation of Prime Factorization is a theorem (see [10, Proposition 10]) that says a given relation
$S$ has a factor $F$ iff, with respect to any attribute $A$ and any value $v$ of its domain, $F$ is a factor of both $Q$
and $R$ where $Q$ and $R$ are relations such that $Q \cup R = S$ and $Q$ consists precisely of the tuples in which
the value of $A$ is $v$. In other words, the approach of [10] to the problem of computing the prime factors
is “horizontal splitting” of the given relation using the selection operation from relational algebra. The
approach of this paper to that same problem is quite different. We utilise “vertical splitting”, using the
projection operation of relational algebra. The theoretical foundation of our approach is based on the
concept of self-correlation of a subset of the attributes; that concept has no analogue in [10].

An excellent exposition of the benefits of the factorisation of relational data is [11]. The factorised
representation both saves space, where the gain can potentially be as good as exponential, and time, 
speeding up the processing of information whose un-factorised representation is too big. [HI proposes a 
way of decomposing relational data that is incomplete and [13] proposes factorisation of relational data
that facilitates machine learning. 

Clusterisation of multidimensional data into non-intersecting classes called clusters is an important, 
hard and computationally demanding problem. [5] investigates clustering in high-dimensional data by
detection of orthogonality in the latter. [8] proposes so-called community discovering, which is a sort of
clusterisation, in media social networks by utilising factorisation of a relational hypergraph. 

The foundation of this paper is the work of Gurov et al. that investigates relational factorisation
of a restricted class of relations called there simple families. [6] introduces the concept of correlation
between the attributes and proposes a fast and practical algorithm that computes the optimum factorisation
of a simple family by using a subroutine for correlation. The fundamental approach of this paper is
an extension of that; however, now correlation is considerably more involved, being not a binary relation
between attributes but a relation of arbitrary arity (this is the only place where “relation” means relation
in the Set Theory sense, that is, a set of ordered tuples).

7 Conclusion 

This paper illustrates the utility of fixed points to formally express maximum independence in relations 
by means of minimum correlated sets of attributes. By using minimum correlated sets, we define an 
inflationary transformer over a finite lattice and show the maximum independent partition is the least 
fixed point of this transformer. Then we prove the downward closure of that least fixed point is closed 
under the transformer. Hence, the least fixed point can be computed by applying the transformer iteratively
from the bottom element of the lattice until stabilization. This iterative construction is the same as
Kleene’s construction, but does not rely on monotonicity of the transformer to guarantee that it computes 
the least fixed point. 

A topic for future work is to introduce a quantitative measure for the degree of independence between 
sets of attributes and investigate approximate relational factorisation. 